Implementation of Atomic Primitives on Distributed Shared Memory Multiprocessors
نویسندگان
چکیده
In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch and Φ, compare and swap, load linked, and store conditionalon large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alternative hardware implementations of these primitives, and then analyze the performance of these implementations for various data sharing patterns. Our results indicate that good overall performance can be obtained by implementing compare and swap in the cache controllers, and by providing an additional instruction to load an exclusive copy of a cache line.
منابع مشابه
Scalable Atomic Primitives for Distributed Shared Memory Multiprocessors
Our research addresses the general topic of atomic update of shared data structures on large-scale shared-memory multiprocessors. In this paper we consider alternative implementations of the general-purpose single-address atomic primitives fetch and , compare and swap, load linked, and store conditional. These primitives have proven popular on small-scale bus-based machines, but have yet to bec...
متن کاملScalability of Atomic Primitives on Distributed Shared Memory Multiprocessors
Many hardware primitives have been proposed for synchronization and atomic memory update on shared-memory multiprocessors. In this paper, we focus on general-purpose primitives that have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed-memory machines. Specifically , we propose several alternative implementations of fetch and ...
متن کاملA Tight Bound on Time Complexity of Mutual Exclusion
In distributed shared memory multiprocessors, remote memory accesses generate processor-tomemory traffic which may result in a bottleneck. It is therefore important to design algorithms that minimize the number of remote memory accesses. We establish a lower bound of 3 on remote access time complexity for mutual exclusion algorithms in a model where processes communicate by means of a general r...
متن کاملPractical Considerations for Non - Blocking Concurrent
An important class of concurrent objects are those that are non-blocking, that is, whose operations are not contained within mutually exclusive critical sections. A non-blocking object can be accessed by many threads at a time, yet update protocols based on atomic Compare-And-Swap operations can be used to guarantee the object's consistency. In this paper we take a practical look at the Compare...
متن کاملSystem Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
This paper overviews results from our recent work on building customized system software support for Distributed Shared Memory Multiprocessors. The mechanisms and policies outlined in this paper are connected with a single conceptual thread: they all attempt to reduce the memory latency of parallel programs by optimizing critical system services, while hiding the complex architectural details o...
متن کامل